Talk by Miguel Á. Carreira-Perpiñán of UC Merced, given at the Redwood Center for Theoretical Neuroscience, UC Berkeley.
One fundamental class of dimensionality reduction algorithms is that of nonlinear embedding (or manifold learning) algorithms. Their goal is to find meaningful low-dimensional coordinates for a collection of N objects, such as images, for which pairwise similarities or distances are known. Their roots are in the multidimensional scaling (MDS) algorithms developed in the psychology literature before the 1950s, and they have seen widespread interest and application in machine learning over the last 20 years.
Embedding algorithms define an objective function of the low-dimensional coordinates of the N data objects, possibly subject to some constraints. When this objective is quadratic, it defines a spectral problem whose global solution is given by the extremal eigenvectors of a certain N×N matrix. This includes algorithms such as classical MDS, Isomap, LLE and Laplacian eigenmaps. When the objective is not quadratic, iterative numerical optimisation is necessary. This includes algorithms such as stochastic neighbor embedding, t-SNE and the elastic embedding.
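As a concrete illustration of the spectral case, the sketch below computes a Laplacian eigenmaps embedding: Gaussian affinities define a graph Laplacian, and the global optimum is read off its bottom nontrivial eigenvectors. This is a minimal dense-matrix sketch (the function name, the Gaussian bandwidth choice and the use of the unnormalised Laplacian are illustrative assumptions, not the speaker's code):

```python
import numpy as np

def laplacian_eigenmaps(X, d=2, sigma=1.0):
    """Toy spectral embedding of the rows of X into d dimensions.
    Illustrative sketch: dense affinities, unnormalised graph Laplacian."""
    # Gaussian affinities from pairwise squared distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian D - W
    # global solution: eigenvectors of the d smallest nonzero eigenvalues,
    # skipping the constant eigenvector (eigenvalue ~0)
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:d + 1]

# usage: embed 50 random 5-D points into the plane
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = laplacian_eigenmaps(X, d=2)
```

For large N one would use sparse nearest-neighbour affinities and an iterative eigensolver rather than a dense eigendecomposition, which is exactly the scaling problem the talk addresses.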
I will give a brief introduction to spectral and nonlinear embedding algorithms and then discuss optimisation approaches for both, with particular attention to large-scale problems. For spectral methods, these include landmark-based methods such as the Nyström formula, column sampling, locally linear landmarks (LLL) and variational Nyström, which solve a reduced eigenproblem on a small set of landmark objects and then extrapolate the solution to all N data objects. For nonlinear methods, I will describe the spectral direction method, which uses a sparse positive definite Hessian approximation to "bend" the true gradient with the curvature of the spectral part of the objective. I will also describe the use of fast multipole methods, which allow us to scale the optimisation to millions of objects by approximating the gradient (quadratic in N if computed exactly) in O(N) time.
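The landmark idea can be sketched in a few lines: eigendecompose a small M×M affinity matrix on the landmarks, then use the Nyström formula to extend those eigenvectors to every point through its affinities with the landmarks. The helper below is a hypothetical illustration of that formula, not the speaker's implementation:

```python
import numpy as np

def nystrom_extrapolate(K_nm, U_m, lam_m):
    """Nyström formula: extend landmark eigenvectors to all N points.
    K_nm : (N, M) affinities between the N points and the M landmarks
    U_m  : (M, d) eigenvectors of the M x M landmark affinity matrix
    lam_m: (d,)   the corresponding eigenvalues
    Costs O(N*M*d) instead of an N x N eigenproblem."""
    return K_nm @ U_m / lam_m

# toy demo: Gaussian affinities on M = 10 landmarks in 3-D
rng = np.random.default_rng(0)
Xm = rng.normal(size=(10, 3))
K_mm = np.exp(-((Xm[:, None] - Xm[None]) ** 2).sum(-1))
lam, U = np.linalg.eigh(K_mm)
lam, U = lam[-2:], U[:, -2:]            # keep the top 2 eigenpairs
# evaluating the formula at the landmarks themselves recovers U exactly
Y = nystrom_extrapolate(K_mm, U, lam)
```

In practice K_nm would hold the affinities between all N points and the landmarks, so the full embedding never requires forming or decomposing the N×N matrix.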
Finally, I will discuss the out-of-sample mapping problem for embeddings: how to define a mapping that projects points not in the training set onto the low-dimensional space. I will describe two approaches: a nonparametric one, which makes no model assumptions and which I will illustrate with the Laplacian Eigenmaps Latent Variable Model; and a parametric one, which uses a parametric mapping such as a neural net. For the latter, I will show how to construct an optimisation algorithm that learns the out-of-sample mapping from the training set by using auxiliary coordinates. The algorithm is universal in that it can be easily adapted to any combination of embedding objective function and parametric mapping.